Investigating the content and form of referring expressions in Mandarin: introducing the Mtuna corpus

Authors

  • Kees van Deemter
  • Le Sun
  • Rint Sybesma
  • Xiao Li
  • Chen Bo
  • Muyun Yang
Abstract

East Asian languages are thought to handle reference differently from English, particularly in terms of the marking of definiteness and number. We present the first Data-Text corpus for Referring Expressions in Mandarin, and we use this corpus to test some initial hypotheses inspired by the theoretical linguistics literature. Our findings suggest that function words deserve more attention in Referring Expression Generation than they have so far received, and they have a bearing on the debate about whether different languages make different trade-offs between clarity and brevity.

Similar resources

Probabilistic Refinement Algorithms for the Generation of Referring Expressions

We propose an algorithm for the generation of referring expressions (REs) that adapts the approach of Areces et al. (2008, 2011) to include overspecification and probabilities learned from corpora. After introducing the algorithm, we discuss how probabilities required as input can be computed for any given domain for which a suitable corpus of REs is available, and how the probabilities can be ...

Spatial Descriptions as Referring Expressions in the MapTask Domain

We discuss work-in-progress on a hybrid approach to the generation of spatial descriptions, using the maps of the Map Task dialogue corpus as domain models. We treat spatial descriptions as referring expressions that distinguish particular points on the maps from all other points (potential ‘distractors’). Our approach is based on rule-based overgeneration of spatial descriptions combined with ...

Lexical Bundles in English Abstracts of Research Articles Written by Iranian Scholars: Examples from Humanities

This paper investigates a special type of recurrent expression, lexical bundles, defined as sequences of three or more words that co-occur frequently in a particular register (Biber et al., 1999). Considering the importance of this group of multi-word sequences in academic prose, this study explores the forms and syntactic structures of three- and four-word bundles in English abstracts writte...

The Effect of Rootstocks on the Peel Phenolic Compounds of Clementine Mandarin (Citrus clementina)

Studies have shown that phenolic compounds are important in human health. The purpose of this research was to examine the influence of rootstocks on phenolic compounds. The content of individual phenolic compounds in fruits was determined by HPLC. Total flavonoid content was measured using a colorimetric method. Free radical scavenging activity on stable DPPH radicals was also evaluated. HPLC ana...

Individual Variation in the Choice of Referential Form

This study aims to measure the variation between writers in their choices of referential form by collecting and analysing a new and publicly available corpus of referring expressions. The corpus is composed of referring expressions produced by different participants in identical situations. Results, measured in terms of normalized entropy, reveal substantial individual variation. We discuss the...

Publication date: 2017